Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
Abstract
In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation (SMT). I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly with Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well suited to syntax modeling. I also present an experiment plan to evaluate these models on a parallel corpus of English and Brazilian Portuguese.
Similar resources
Extended Tree Transducers in Natural Language Processing
Tree transducers are finite-state devices computing relations on trees. Their study was initiated by Thatcher (1970) and Rounds (1970), who established the classical top-down tree transducers that process the input tree from the root towards the leaves. Shortly afterwards, Baker (1973) introduced the bottom-up tree transducers that process the input tree from the leaves towards the root in anal...
Proceedings of the 10th European Workshop on Natural Language Generation (ENLG-05)
Probabilistic finite-state methods have been very successful for natural language processing (NLP) problems like tagging, entity identification, and transliteration. These methods have also been packaged in very useful software toolkits. However, they are not so good for attacking problems with large-scale reordering (translation, generation, paraphrasing, question answering, etc.) and sensitiv...
Efficient Inference through Cascades of Weighted Tree Transducers
Weighted tree transducers have been proposed as useful formal models for representing syntactic natural language processing applications, but there has been little description of inference algorithms for these automata beyond formal foundations. We give a detailed description of algorithms for application of cascades of weighted tree transducers to weighted tree acceptors, connecting formal the...
Tree Transducers, Machine Translation, and Cross-Language Divergences
Tree transducers are formal automata that transform trees into other trees. Many varieties of tree transducers have been explored in the automata theory literature, and more recently, in the machine translation literature. In this paper I review T and xT transducers, situate them among related formalisms, and show how they can be used to implement rules for machine translation systems that cove...
Discontinuous Statistical Machine Translation with Target-Side Dependency Syntax
For several languages only potentially non-projective dependency parses are readily available. Projectivizing the parses and utilizing them in syntax-based translation systems often yields particularly bad translation results indicating that those translation models cannot properly utilize such information. We demonstrate that our system based on multi bottom-up tree transducers, which can nati...